
[benchmark] add option to enable CompiledAutograd #1536

Draft · crcrpar wants to merge 2 commits into main from crpa/litgptbench_compiledautograd
Conversation

@crcrpar (Collaborator) commented on Dec 10, 2024

What does this PR do?

CompiledAutograd seems to speed up FSDP2, which I verified with torchtitan.
However, I somehow do not find it beneficial for litgpt models.
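
For reference, a minimal sketch of how CompiledAutograd is typically enabled around the backward pass (this uses the `torch._dynamo.compiled_autograd.enable` context-manager API from recent PyTorch; the model, optimizer, and backend here are placeholders, not this benchmark's actual option):

```python
import torch
import torch._dynamo.compiled_autograd as compiled_autograd

model = torch.nn.Linear(1024, 1024)  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def compiler_fn(gm):
    # compiler_fn receives the captured backward graph (an fx.GraphModule)
    # and returns a compiled callable for it.
    return torch.compile(gm, backend="inductor")

x = torch.randn(32, 1024)
loss = model(x).sum()

# Capture the autograd graph for this backward call and compile it.
with compiled_autograd.enable(compiler_fn):
    loss.backward()
opt.step()
```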

Setting: pjnl-20241209, 8x H100

torchtitan Llama-3-8B

This run uses activation checkpointing, since the provided config enables it by default -- https://github.com/pytorch/torchtitan/blob/05a8b5e4c1de979c4b49ff36e6b09d6055db29b1/train_configs/llama3_8b.toml#L53-L55

| CompiledAutograd | Performance (tps) | Memory (GB) |
|---|---|---|
| N | 6244 | 51.2 |
| Y | 7200 | 43.0 |

litgpt llama-2-7b-hf

| CompiledAutograd | Performance (tokens/s/GPU) | Memory (GB) |
|---|---|---|
| N | 11722.76 | 39.13 |
| Y | 10702.33 | 52.61 |

@crcrpar crcrpar force-pushed the crpa/litgptbench_compiledautograd branch from fa70057 to b772a75 on January 6, 2025 at 05:50
@IvanYashchuk (Collaborator)

Have you investigated further why this option helps the torchtitan model code and why there are no improvements here?

@crcrpar (Collaborator, Author) commented on Feb 3, 2025

No, I haven't had enough bandwidth for it.

@riccardofelluga (Collaborator)

> @IvanYashchuk Have you investigated further why this option helps the torchtitan model code and why there are no improvements here?

It looks like torchtitan uses compiled_autograd only for DDP and not FSDP. In the PR where they added the compiled_autograd option, they enabled it only for DDP. From the comments, they seem confident that compiled autograd brings an advantage for DDP, but they don't seem to be as optimistic about FSDP.

Here is where the DDP compiled_autograd option is implemented: https://github.com/pytorch/torchtitan/blob/49c6d6fc15ef644e5c3b1003ad4e0d9ea5fcb9a9/torchtitan/parallelisms/parallelize_llama.py#L105-L110
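
A rough illustration of that kind of gating (a paraphrase of the pattern, not the actual torchtitan code; `apply_ddp` and `enable_compiled_autograd` are placeholder names, and it assumes a recent PyTorch where the `torch._dynamo.config.compiled_autograd` and `optimize_ddp` knobs exist and `torch.distributed` has already been initialized):

```python
import torch

def apply_ddp(model: torch.nn.Module, enable_compiled_autograd: bool) -> torch.nn.Module:
    # Compiled autograd is only turned on for the DDP path; the FSDP path
    # is left untouched.
    if enable_compiled_autograd:
        # Route DDP gradient reduction through the python reducer so the
        # backward pass can be captured by compiled autograd.
        torch._dynamo.config.optimize_ddp = "python_reducer"
        torch._dynamo.config.compiled_autograd = True
    return torch.nn.parallel.DistributedDataParallel(model)
```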

@IvanYashchuk (Collaborator)

Does the all-reduce operation appear in the FX graph with DDP + compiled_autograd?
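
One way to check this (a sketch, not something verified in this PR): pass a compiler_fn that dumps the captured backward graph and look for collective nodes such as all_reduce. The DDP setup itself is assumed to exist elsewhere:

```python
import torch
import torch._dynamo.compiled_autograd as compiled_autograd

def inspecting_compiler_fn(gm):
    # Print the captured backward FX graph; if the gradient all_reduce calls
    # were captured into the graph, they should show up as nodes here.
    gm.graph.print_tabular()
    return torch.compile(gm, backend="eager")

# loss = ddp_model(batch).sum()   # assuming a DDP-wrapped model and input batch
# with compiled_autograd.enable(inspecting_compiler_fn):
#     loss.backward()
```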
